Blog Classification with Co-training

نویسندگان

  • Joshua Gerrish
  • Vahed Qazvinian
  • Xiadong Shi
چکیده

In this project we use co-training to classify blogs by political leaning. We classify them into two pre-defined categories: liberal and conservative. We examine the performance of co-training versus normal supervised learning. We also look at using several different features to improve training.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Leveraging Interactive Knowledge and Unlabeled Data in Gender Classification with Co-training

Conventional approaches to gender classification much rely on a large scale of labeled data, which is normally hard and expensive to obtain. In this paper, we propose a co-training approach to address this problem in gender classification. Specifically, we employ both non-interactive and interactive texts, i.e., the message and comment texts, as two different views in our cotraining approach to...

متن کامل

Filling the Gap: Semi-Supervised Learning for Opinion Detection Across Domains

We investigate the use of Semi-Supervised Learning (SSL) in opinion detection both in sparse data situations and for domain adaptation. We show that co-training reaches the best results in an in-domain setting with small labeled data sets, with a maximum absolute gain of 33.5%. For domain transfer, we show that self-training gains an absolute improvement in labeling accuracy for blog data of 16...

متن کامل

Topic Classification of Blog Posts Using Distant Supervision

Classifying blog posts by topics is useful for applications such as search and marketing. However, topic classification is time consuming and error prone, especially in an open domain such as the blogosphere. The state-of-the-art relies on supervised methods, requiring considerable training effort, that use the whole corpus vocabulary as features, demanding considerable memory to process. We sh...

متن کامل

Blog Classification Using Tags: An Empirical Study

With an exponential growth of Weblogs (or blogs), many blog directories have appeared to help users to locate topical blogs. As tags are commonly used to describe blogs, we study the effectiveness of tags in blog classification. Compared with titles and descriptions, our experiments, using 24,247 blogs, showed that tags could lead to better classification accuracy. It is interesting to observe ...

متن کامل

Semi-Supervised Learning for Blog Classification

Blog classification (e.g., identifying bloggers’ gender or age) is one of the most interesting current problems in blog analysis. Although this problem is usually solved by applying supervised learning techniques, the large labeled dataset required for training is not always available. In contrast, unlabeled blogs can easily be collected from the web. Therefore, a semi-supervised learning metho...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007